NIL Is Not Nothing: Recognition of Chinese Network Informal Language Expressions
Authors
Abstract
Informal language is actively used in network-mediated communication, e.g. chat rooms, BBS, email and text messages. We refer to the anomalous terms used in such contexts as network informal language (NIL) expressions. For example, “偶 (ou3)” is used to replace “我 (wo3)” in Chinese ICQ. Without unconventional resources, knowledge and techniques, existing natural language processing approaches are less effective in dealing with NIL text. We propose to study NIL expressions with a NIL corpus and to investigate techniques for processing them. Two methods for Chinese NIL expression recognition are designed in the NILER system. The experimental results show that the pattern matching method produces higher precision, while the support vector machines method achieves a higher F-1 measure. These results are encouraging and justify our future research effort in NIL processing.
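To make the pattern-matching idea and the precision and F-1 measures named in the abstract concrete, here is a minimal sketch in Python. It is not the NILER implementation: the lexicon entries, the function names `recognize` and `precision_recall_f1`, and the longest-match strategy are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon mapping a NIL expression to a standard form.
# These entries are illustrative assumptions, not data from the paper.
NIL_LEXICON: Dict[str, str] = {"偶": "我", "88": "再见", "gg": "哥哥"}

Span = Tuple[int, int, str]  # (start offset, end offset, matched text)


def recognize(text: str, lexicon: Dict[str, str]) -> List[Span]:
    """Scan left to right, taking the longest lexicon entry at each position."""
    max_len = max(map(len, lexicon))
    spans: List[Span] = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in lexicon:
                spans.append((i, i + n, candidate))
                i += n
                break
        else:
            i += 1  # no entry starts here; move one character on
    return spans


def precision_recall_f1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-span precision, recall, and F-1 (harmonic mean of the two)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

A lexicon lookup of this kind can only find expressions it already lists, which tends to give few false positives but misses unseen expressions. That is one plausible reading of why a pattern-based recognizer could score higher on precision while a learned classifier scores higher on F-1.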
Similar resources
A Two-Stage Incremental Annotation Approach to Constructing a Network Informal Language Corpus
Network Informal Language (NIL) refers to the special human language widely used in the community of digital network chat via platforms such as chat rooms/tools, mobile phone short message services (SMS), bulletin board systems (BBS), emails, etc. NIL holds anomalous characteristics in forming words, phrases, and non-alphabetical characters. This makes it difficult to handle NIL text by convent...
Pragmatic expressions in cross-linguistic perspective
This paper focuses on some pragmatic expressions that are characteristic of informal spoken English, their possible equivalents in some other languages, and their use by EFL learners from different backgrounds. These expressions, called general extenders (e.g. and stuff, or something), are shown to be different from discourse markers and to exhibit variation in form, funct...
A Neural Network-Based System for Recognizing and Classifying Named Entities in Persian Texts
Named Entity Recognition (NER) is a fundamental task in natural language processing and is also known as a subtask of information extraction. It seeks to locate named entities in text and classify them into predefined categories such as the names of persons, organizations, locations, expressions of time, etc. Named Entity Recognition for English texts has been researched widely in past years, howev...
Mining Informal Language from Chinese Microtext: Joint Word Recognition and Segmentation
We address the problem of informal word recognition in Chinese microblogs. A key problem is the lack of word delimiters in Chinese. We exploit this reliance as an opportunity: recognizing the relation between informal word recognition and Chinese word segmentation, we propose to model the two tasks jointly. Our joint inference method significantly outperforms baseline systems that conduct the t...
Spatial and symbolic recognition of Chinese mosques
The history of Islam in China began in 654 AD, when the first ambassador of the Islamic caliphate was received at the court of the Chinese emperor. Over the following century, Islam spread throughout the country. In this study, the authors examine how architectural elements and spatial forms are affected by the Islamic or the Buddhist-Chinese tradition. First, it must be made clear which symbo...